Configure Log Parsing Rules
In full log data, you can route logs that meet specific conditions to different log libraries by adding log libraries. In addition, by configuring log parsing rules, you can structure logs of a specific format, which speeds up queries. This document explains how to configure log parsing rules.
1. Determine the Application Log Format
Taking Java log format as an example:
2024-10-17 12:03:18.971 [http-nio-12346-exec-17] INFO travel.service.TravelServiceImpl - [getRestTicketNumber][Get Rest tickets num][num is: Response(status=1, msg=Get Left Ticket of Internal Success, data=1073741823)]
2. Write Parsing Rules Based on the Log Format
In APO log parsing rule configuration, the capture group names in the regular expression will be extracted as log field columns. Below is an example of a parsing rule that matches the above log format:
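# Parse .content; on success, .msg holds an object built from the named capture groups.
.msg, err = parse_regex(.content, r'(?P<ts>.*?) \[(?P<thread>.*?)\] (?P<level>.*?) (?P<method>.*?) - (?P<msg>.*)')
if err == null {
    # Replace the raw log line with the captured fields encoded as JSON.
    .content = encode_json(.msg)
}
# Remove the temporary field.
del(.msg)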
Here, the pattern passed to parse_regex is the regular expression:
(?P<ts>.*?) \[(?P<thread>.*?)\] (?P<level>.*?) (?P<method>.*?) - (?P<msg>.*)
The above regular expression uses named capture group syntax to convert log content into structured labels such as ts, thread, level, method, and msg. The labels of the named capture groups appear under "Log Fields" and correspond to individual columns in the ClickHouse log library, which speeds up queries.
Note: ts and msg are two special labels that will not be processed as columns. Use ts only for the log time and msg only for the log message.
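The extraction described above can be checked locally before configuring the rule. The following is a minimal sketch in Python, whose re module accepts the same (?P<name>...) syntax. It assumes that this pattern behaves identically in Python and in the regex engine behind parse_regex; the helper name parse_line is hypothetical.

```python
import re

# Pattern copied from the parsing rule above.
PATTERN = re.compile(
    r'(?P<ts>.*?) \[(?P<thread>.*?)\] (?P<level>.*?) (?P<method>.*?) - (?P<msg>.*)'
)

# Sample log line from step 1.
LINE = (
    "2024-10-17 12:03:18.971 [http-nio-12346-exec-17] INFO "
    "travel.service.TravelServiceImpl - [getRestTicketNumber][Get Rest tickets num]"
    "[num is: Response(status=1, msg=Get Left Ticket of Internal Success, data=1073741823)]"
)


def parse_line(line):
    """Return the named capture groups as a dict, or None if the line does not match."""
    match = PATTERN.match(line)
    return match.groupdict() if match else None


fields = parse_line(LINE)

# ts and msg are the special labels; the remaining names would become log field columns.
column_fields = {k: v for k, v in fields.items() if k not in ("ts", "msg")}
print(column_fields)
```

With the sample line, column_fields contains thread, level, and method, while ts and msg are kept apart as the log time and log message.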
How to Modify the Expression Based on Requirements
Suppose you need to extract getRestTicketNumber, the bracketed text that follows the - symbol in the example log content, as a caller method label. You can add \[(?P<caller>.*?)\] to the regular expression, resulting in the following:
(?P<ts>.*?) \[(?P<thread>.*?)\] (?P<level>.*?) (?P<method>.*?) - \[(?P<caller>.*?)\](?P<msg>.*)
After the rule is configured, the capture group \[(?P<caller>.*?)\] extracts the log field column caller, speeding up filtering and queries. The structured label "caller":"getRestTicketNumber" then appears in the final log.
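As a quick local check of the caller extraction, here is a minimal Python sketch. It assumes the caller group is written as \[(?P<caller>.*?)\], so that the named group sits inside the literal brackets, and it assumes the pattern behaves identically in Python's re module and in parse_regex. The second sample line is hypothetical and only shows what happens when no bracketed caller is present.

```python
import re

# Extended pattern; the caller group is assumed to be \[(?P<caller>.*?)\].
PATTERN_WITH_CALLER = re.compile(
    r'(?P<ts>.*?) \[(?P<thread>.*?)\] (?P<level>.*?) (?P<method>.*?) - '
    r'\[(?P<caller>.*?)\](?P<msg>.*)'
)

# Sample log line from step 1.
LINE = (
    "2024-10-17 12:03:18.971 [http-nio-12346-exec-17] INFO "
    "travel.service.TravelServiceImpl - [getRestTicketNumber][Get Rest tickets num]"
    "[num is: Response(status=1, msg=Get Left Ticket of Internal Success, data=1073741823)]"
)

# Hypothetical line whose message does not start with a bracketed caller.
PLAIN_LINE = "2024-10-17 12:03:19.001 [main] WARN travel.App - plain message"

caller_fields = PATTERN_WITH_CALLER.match(LINE).groupdict()
print(caller_fields["caller"])  # getRestTicketNumber
print(caller_fields["msg"])     # the remaining bracketed segments

# The pattern requires "[" right after " - ", so this line is not matched.
plain_match = PATTERN_WITH_CALLER.match(PLAIN_LINE)
print(plain_match)  # None
```

Because the bracket after the - symbol is mandatory in this form, lines without a bracketed caller fail to match; whether that is acceptable depends on how uniform the application's log format is.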
Extended Explanation: Capture Group Expressions
In regular expressions, (?P<name>...) is a syntax used for "named capture groups." The name within the parentheses is the name of the capture group, allowing you to reference the captured content by name rather than position. APO uses named capture groups as part of the log parsing rules.
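The difference between referencing by name and by position can be sketched in Python, which supports the same syntax. The two-group pattern and the variable names below are hypothetical and used only for illustration.

```python
import re

# Hypothetical pattern: one named group followed by one unnamed group.
pattern = re.compile(r'(?P<level>[A-Z]+) (\S+)')
m = pattern.match("INFO travel.service.TravelServiceImpl")

by_name = m.group("level")   # referenced by name
by_position = m.group(1)     # the same group, referenced by position
unnamed = m.group(2)         # the unnamed group is reachable only by position
named_only = m.groupdict()   # contains named groups only

print(by_name, by_position, unnamed, named_only)
```

Only the named group shows up in the dictionary of captures, which is why a group needs a name before it can serve as a log field.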
3. Verify the Rule's Correctness
Please enter the log parsing rule in VRL Playground and follow the illustrated process to verify the rule's correctness.
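Before pasting the rule into the playground, it can help to run the pattern over several sample lines at once. The following Python sketch is one way to do that under the assumption that the pattern behaves the same in Python's re module as in parse_regex. The function check_samples and the stack-trace sample line are hypothetical.

```python
import re

# Pattern from the parsing rule in step 2.
PATTERN = re.compile(
    r'(?P<ts>.*?) \[(?P<thread>.*?)\] (?P<level>.*?) (?P<method>.*?) - (?P<msg>.*)'
)
REQUIRED = ("ts", "thread", "level", "method", "msg")


def check_samples(lines):
    """Return (line_number, reason) for every sample the pattern fails on."""
    problems = []
    for number, line in enumerate(lines, start=1):
        match = PATTERN.match(line)
        if match is None:
            problems.append((number, "no match"))
            continue
        # A match with an empty field usually means the pattern is too loose.
        empty = [name for name in REQUIRED if not match.group(name)]
        if empty:
            problems.append((number, "empty fields: " + ", ".join(empty)))
    return problems


samples = [
    "2024-10-17 12:03:18.971 [http-nio-12346-exec-17] INFO "
    "travel.service.TravelServiceImpl - [getRestTicketNumber][Get Rest tickets num]",
    # Hypothetical stack-trace continuation line, which the rule is not written for.
    "    at travel.service.TravelServiceImpl.getRestTicketNumber(TravelServiceImpl.java:42)",
]
problems = check_samples(samples)
print(problems)
```

Lines reported here, such as stack-trace continuations, are the ones worth trying in the playground to see how the rule treats them.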